Bagging and Boosting statistical machine translation systems

Authors

  • Tong Xiao
  • Jingbo Zhu
  • Tongran Liu
Abstract

In this article we address the issue of generating diversified translation systems from a single Statistical Machine Translation (SMT) engine for system combination. Unlike traditional approaches, we do not resort to multiple structurally different SMT systems, but instead directly learn a strong SMT system from a single translation engine in a principled way. Our approach is based on Bagging and Boosting, which are two instances of the general framework of ensemble learning. The basic idea is that we first generate an ensemble of weak translation systems using a base learning algorithm, and then learn a strong translation system from the ensemble. One of the advantages of our approach is that it can work with any current SMT system and make it stronger almost "for free". Beyond this, most system combination methods are directly applicable to the proposed framework for generating the final translation system from the ensemble of weak systems. We evaluate our approach on Chinese–English translation with three state-of-the-art SMT systems: a phrase-based system, a hierarchical phrase-based system and a syntax-based system. Experimental results on the NIST MT evaluation corpora show that our approach leads to significant improvements in translation accuracy over the baselines. More interestingly, it is observed that our approach is able to improve existing system combination systems. The biggest improvements are obtained by generating weak systems using Bagging/Boosting and learning the strong system with a state-of-the-art system combination method.
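The loop the abstract describes (resample or reweight the training bitext, train one weak system per sample, then merge their outputs with a system combination method) can be sketched in a few lines. The sketch below is illustrative only and is not the paper's implementation: train_fn, decode_fn and combine_fn are hypothetical callables standing in for the base SMT training pipeline, its decoder, and whatever system combination method is used. A Boosting variant would reweight the bitext between iterations instead of bootstrap-resampling it.

import random
from typing import Callable, List, Sequence, Tuple

Bitext = List[Tuple[str, str]]  # (source sentence, target sentence) pairs


def bagging_smt(bitext: Bitext,
                train_fn: Callable[[Bitext], object],
                k: int = 8,
                seed: int = 13) -> List[object]:
    """Train k weak SMT systems on bootstrap resamples of the bitext.

    train_fn is a stand-in for the base SMT training pipeline
    (phrase-based, hierarchical phrase-based, or syntax-based) and
    must be supplied by the caller.
    """
    rng = random.Random(seed)
    weak_systems = []
    for _ in range(k):
        # Bootstrap: sample the parallel corpus with replacement so that
        # each weak system sees a slightly different view of the data.
        sample = [rng.choice(bitext) for _ in range(len(bitext))]
        weak_systems.append(train_fn(sample))
    return weak_systems


def combine_translations(weak_systems: Sequence[object],
                         decode_fn: Callable[[object, str], List[str]],
                         combine_fn: Callable[[List[List[str]]], str],
                         source: str) -> str:
    """Decode with every weak system and merge their n-best lists.

    combine_fn stands in for any system combination method
    (e.g. confusion-network-based combination).
    """
    nbest_lists = [decode_fn(system, source) for system in weak_systems]
    return combine_fn(nbest_lists)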

Similar articles

Combining Bagging and Additive Regression

Bagging and boosting are among the most popular resampling ensemble methods that generate and combine a diversity of regression models using the same learning algorithm as the base learner. Boosting algorithms are considered stronger than bagging on noise-free data. However, there are strong empirical indications that bagging is much more robust than boosting in noisy settings. For this reason, in t...

Combining Bagging and Boosting

Bagging and boosting are among the most popular resampling ensemble methods that generate and combine a diversity of classifiers using the same learning algorithm for the base classifiers. Boosting algorithms are considered stronger than bagging on noise-free data. However, there are strong empirical indications that bagging is much more robust than boosting in noisy settings. For this reason, i...

When Efficient Model Averaging Out-Performs Boosting and Bagging

Bayesian model averaging, also known as the Bayes optimal classifier (BOC), is an ensemble technique used extensively in the statistics literature. However, compared to other ensemble techniques such as bagging and boosting, BOC is less known and rarely used in data mining. This is partly due to model averaging being perceived as inefficient and because bagging and boosting consistently out...

Boosting-Based System Combination for Machine Translation

In this paper, we present a simple and effective method to address the issue of how to generate diversified translation systems from a single Statistical Machine Translation (SMT) engine for system combination. Our method is based on the framework of boosting. First, a sequence of weak translation systems is generated from a baseline system in an iterative manner. Then, a strong translation sys...

Ensemble Methods for Environmental Data Modelling with Support Vector Regression

This paper investigates the use of ensembles of predictors in order to improve the performance of spatial prediction methods. Support vector regression (SVR), a popular method from the field of statistical machine learning, is used. Several instances of SVR are combined using different data sampling schemes (bagging and boosting). Bagging shows good performance, and proves to be more computation...

Journal:
  • Artif. Intell.

Volume 195

Pages -

Publication date: 2013